OcrV1, Main, Exploration, bibRecord, 000672

Subspace models for document script and language identification

Internal identifier: 000672 (Main/Exploration); previous: 000671; next: 000673

Authors: T. N. Vikram [France]; K. Chidananda Gowda [India]

Source:

International Journal of Imaging Systems and Technology [ 0899-9457 ] ; 2010-06.

RBID : ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104

English descriptors

KwdEn :
- 2DFLD, 2DPCA, OCR, document image processing, language identification, script identification, subspace models.

Abstract

In this article, we explore the suitability of subspace models like 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137], 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129], etc. for document script and language identification. They are employed to identify language and script at both paragraph and word level. Elaborate experimentation has been conducted which has revealed that they are robust enough to handle highly confusing scripts and their performance does not degrade drastically even in the presence of noise. A generic language identification has been attempted in this work, to identify languages of both Asian and European origin by considering a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010

Url:

https://api.istex.fr/document/18B9CF840974D4B8413EFE9142CB843C6F9BE104/fulltext/pdf

DOI: 10.1002/ima.20215

Affiliations:

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000973
to stream Istex, to step Curation: 000962
to stream Istex, to step Checkpoint: 000252
to stream Main, to step Merge: 000677
to stream Main, to step Curation: 000672

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Subspace models for document script and language identification</title>
<author><name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
</author>
<author><name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1002/ima.20215</idno>
<idno type="url">https://api.istex.fr/document/18B9CF840974D4B8413EFE9142CB843C6F9BE104/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000973</idno>
<idno type="wicri:Area/Istex/Curation">000962</idno>
<idno type="wicri:Area/Istex/Checkpoint">000252</idno>
<idno type="wicri:doubleKey">0899-9457:2010:Vikram T:subspace:models:for</idno>
<idno type="wicri:Area/Main/Merge">000677</idno>
<idno type="wicri:Area/Main/Curation">000672</idno>
<idno type="wicri:Area/Main/Exploration">000672</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Subspace models for document script and language identification</title>
<author><name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
<affiliation wicri:level="3"><country xml:lang="fr">France</country>
<wicri:regionArea>GREYC, Université de Caen. 6, Boulevard du Maréchal Juin, 14050 CAEN CEDEX</wicri:regionArea>
<placeName><region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Basse-Normandie</region>
<settlement type="city">CAEN</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>International School of Information Management, University of Mysore, 3004, “Udayaravi” 5th Main, 12th Cross V. V. Puram, Mysore, Karnataka</wicri:regionArea>
<wicri:noRegion>Karnataka</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint><publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2010-06">2010-06</date>
<biblScope unit="volume">20</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="140">140</biblScope>
<biblScope unit="page" to="148">148</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>
<idno type="istex">18B9CF840974D4B8413EFE9142CB843C6F9BE104</idno>
<idno type="DOI">10.1002/ima.20215</idno>
<idno type="ArticleID">IMA20215</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0899-9457</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>2DFLD</term>
<term>2DPCA</term>
<term>OCR</term>
<term>document image processing</term>
<term>language identification</term>
<term>script identification</term>
<term>subspace models</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="fr">In this article, we explore the suitability of subspace models like 2DPCA [Yang et al., IEEE Trans Pattern Anal Machine Intelligence 26 (2004), 131–137], 2DFLD [Yang et al., Pattern Recogn 38 (2005), 1125–1129], etc. for document script and language identification. They are employed to identify language and script at both paragraph and word level. Elaborate experimentation has been conducted which has revealed that they are robust enough to handle highly confusing scripts and their performance does not degrade drastically even in the presence of noise. A generic language identification has been attempted in this work, to identify languages of both Asian and European origin by considering a dataset of 20 different languages. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 140–148, 2010</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
<li>Inde</li>
</country>
<region><li>Basse-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement><li>CAEN</li>
</settlement>
</list>
<tree><country name="France"><region name="Région Normandie"><name sortKey="Vikram, T N" sort="Vikram, T N" uniqKey="Vikram T" first="T. N." last="Vikram">T. N. Vikram</name>
</region>
</country>
<country name="Inde"><noRegion><name sortKey="Gowda, K Chidananda" sort="Gowda, K Chidananda" uniqKey="Gowda K" first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
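
As a rough illustration of how a record like the one above might be read programmatically, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It works on a shortened copy of the record rather than the full text. The `xmlns:wicri` declaration and its URI are placeholder assumptions: the page does not declare that prefix, and a strict XML parser would probably reject the `wicri:` attributes without one. The `summarize` helper and the keys it returns are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above. The xmlns:wicri URI is a placeholder
# assumption; the original page does not declare the "wicri" prefix.
RECORD = """<record xmlns:wicri="http://example.org/wicri">
<TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">Subspace models for document script and language identification</title>
<author><name first="T. N." last="Vikram">T. N. Vikram</name></author>
<author><name first="K. Chidananda" last="Gowda">K. Chidananda Gowda</name></author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1002/ima.20215</idno>
<idno type="wicri:Area/Main/Exploration">000672</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI>
</record>"""


def summarize(xml_text):
    """Return title, authors, DOI and exploration key from a record string."""
    root = ET.fromstring(xml_text)
    title_stmt = root.find(".//titleStmt")
    # Index every <idno> by its "type" attribute for direct lookup.
    idnos = {el.get("type"): el.text for el in root.iter("idno")}
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.iter("name")],
        "doi": idnos.get("doi"),
        "key": idnos.get("wicri:Area/Main/Exploration"),
    }


summary = summarize(RECORD)
print(summary)
```

The lookup is restricted to `titleStmt` because the full record repeats each author under `sourceDesc` and again in the `affiliations` tree, so iterating over every `name` element would return duplicates.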

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000672 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000672 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104
   |texte=   Subspace models for document script and language identification
}}
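
When many records need such a link, the template call could be generated rather than typed by hand. The sketch below is one possible way to do that in Python. The function `explor_link` and its argument names are hypothetical; only the template name and its parameter names (`wiki`, `area`, `flux`, `étape`, `type`, `clé`, `texte`) come from the page. It also assumes that the alignment spaces after `=` in the template above are not significant.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Render an {{Explor lien}} template call as wikitext."""
    # Parameter names are copied from the template shown on the page.
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    lines += [f"   |{name}={value}" for name, value in fields]
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    step="Exploration",
    key_type="RBID",
    key="ISTEX:18B9CF840974D4B8413EFE9142CB843C6F9BE104",
    text="Subspace models for document script and language identification",
)
print(link)
```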

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
